Automated Curriculum Learning by Rewarding Temporally Rare Events

Authors

  • Niels Justesen
  • Sebastian Risi
Abstract

Reward shaping allows reinforcement learning (RL) agents to accelerate learning by receiving additional reward signals. However, these signals can be difficult to design manually, especially for complex RL tasks. We propose a simple and general approach that determines the reward of pre-defined events by their rarity alone. Here events become less rewarding as they are experienced more often, which encourages the agent to continually explore new types of events as it learns. The adaptiveness of this reward function results in a form of automated curriculum learning that does not have to be specified by the experimenter. We demonstrate that this Rarity of Events (RoE) approach enables the agent to succeed in challenging VizDoom scenarios without access to the extrinsic reward from the environment. Furthermore, the results demonstrate that RoE learns a more versatile policy that adapts well to critical changes in the environment. Rewarding events based on their rarity could help in many unsolved RL environments that are characterized by sparse extrinsic rewards but a plethora of known event types.
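The idea that events become less rewarding the more often they are experienced can be illustrated with a minimal sketch. The abstract does not state the exact formula, so everything below is an assumption made for illustration: the class name `RarityOfEventsReward`, the inverse-of-mean reward, the `threshold` floor that avoids division by zero, and the running mean taken over all past episodes.

```python
class RarityOfEventsReward:
    """Hypothetical rarity-based reward: rarer events pay more.

    Assumption: reward for one occurrence of an event is
    1 / max(mean occurrences per episode, threshold).
    """

    def __init__(self, event_names, threshold=0.01):
        self.threshold = threshold  # floor on the mean, avoids division by zero
        self.episodes = 0           # number of completed episodes
        self.mean_counts = {name: 0.0 for name in event_names}

    def reward(self, step_events):
        """Return the reward for the events observed in one time step.

        step_events maps an event name to how many times it occurred.
        """
        return sum(
            count / max(self.mean_counts[name], self.threshold)
            for name, count in step_events.items()
        )

    def end_episode(self, episode_counts):
        """Update the running mean of per-episode counts for every event."""
        self.episodes += 1
        for name in self.mean_counts:
            count = episode_counts.get(name, 0)
            delta = count - self.mean_counts[name]
            self.mean_counts[name] += delta / self.episodes


# Usage: event names here are placeholders, not taken from the paper.
roe = RarityOfEventsReward(["pickup", "kill"], threshold=0.5)
first_kill = roe.reward({"kill": 1})      # never seen before: maximal reward
roe.end_episode({"pickup": 4})            # pickups are now common
common_pickup = roe.reward({"pickup": 1})
rare_kill = roe.reward({"kill": 1})
```

In this sketch a frequently seen event such as `pickup` quickly loses value while an unseen event keeps its maximal reward, which is what pushes the agent toward new event types over time.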


Similar resources

Stepping into Mindful Education: A Teacher Educator’s Narrative of Contextualizing a SLTE Curriculum

Initiation into contextualizing mindful second language teacher education (SLTE) has challenged teacher educators, causing their retreat into mindless submission to ready-made standardized directives. To revive the starting perspective in curriculum development in light of the recent trend towards responsive SLTE, this practitioner research investigated how the context was incorporat...


Temporally remote destabilization of prediction after rare breaches of expectancy.

While neural signatures of breaches of expectancy and their immediate effects have been investigated, thus far, temporally more remote effects have been neglected. The present fMRI study explored neural correlates of temporally remote destabilization of prediction following rare breaches of expectancy with a mean delay of 14 s. We hypothesized temporally remote destabilization to be reflected e...


Gradient-Based Relational Reinforcement Learning of Temporally Extended Policies

We consider the problem of computing general policies for decision-theoretic planning problems with temporally extended rewards. We consider a gradient-based approach to relational reinforcement-learning (RRL) of policies for that setting. In particular, the learner optimises its behaviour by acting in a set of problems drawn from a target domain. Our approach is similar to inductive policy s...


Extending Workers' Attention Span Through Dummy Events

This paper studies a new paradigm for improving the attention span of workers in tasks that heavily rely on user’s attention to the occurrence of rare events. Such tasks are highly common, ranging from crime monitoring to controlling autonomous complex machines, and many of them are ideal for crowdsourcing. The underlying idea in our approach is to dynamically augment the task with some dummy (...


Inferring Temporal Ordering of Events in News

This paper describes a domain-independent, machine-learning based approach to temporally anchoring and ordering events in news. The approach achieves 84.6% accuracy in temporally anchoring events and 75.4% accuracy in partially ordering them.




Publication date: 2018